A trait database and updated checklist for European subterranean spiders

Species traits are an essential currency in ecology, evolution, biogeography, and conservation biology. However, trait databases are unavailable for most organisms, especially those living in difficult-to-access habitats such as caves and other subterranean ecosystems. We compiled an expert-curated trait database for subterranean spiders in Europe using both literature data (including grey literature published in many different languages) and direct morphological measurements whenever specimens were available to us. We started by updating the checklist of European subterranean spiders, now including 512 species across 20 families, of which at least 192 have been found uniquely in subterranean habitats. For each of these species, we compiled 64 traits. The trait database encompasses morphological measures, including several traits related to subterranean adaptation, and ecological traits referring to habitat preference, dispersal, and feeding strategies. By making these data freely available, we open up opportunities for exploring different research questions, from the quantification of functional dimensions of subterranean adaptation to the study of spatial patterns in functional diversity across European caves.

Taxonomic and geographical coverage. We initially updated the checklist of European subterranean spiders provided in ref. 29 to obtain a list of species for which to collect relevant traits. Specifically, we: (I) included new species described after 2017; (II) included species that we had overlooked in the previous version of the checklist (see, e.g., the missing species pointed out in ref. 33); and (III) updated the taxonomy following recent nomenclatural changes 34 and corrected a few mistakes we detected.
Concerning the geographical coverage of the checklist, we considered all European countries as defined in the Spiders of Europe 35 database. However, we excluded North Africa, because it was only recently included in the Spiders of Europe, and oceanic islands (Azores, Madeira, and Canary Islands), because their insularity may lead to different processes shaping regional diversity 36.

Trait collection. For each species, we collected traits using literature data (mainly taxonomic descriptions) and, whenever specimens were available to us, direct measurements (27%; n = 137 species) (Fig. 1). We retrieved literature primarily from the World Spider Catalog 34 (i.e., the main repository of bibliography on the taxonomy of spiders 37) and secondarily from the Spiders of Europe repository and Google Scholar. We used the latter two sources to retrieve most literature on ecological traits.
Morphological measures. We collected several traits that different authors have considered proxies for body size, food specialization, and subterranean adaptation 38-41 (Table 2). All length measures are given in millimetres.
For measured species, we used averaged values whenever multiple specimens were available to us. If total body length was not reported in the original description, we approximated it as the sum of prosoma and opisthosoma lengths. Also, in a few descriptions, authors reported tibia length as the sum of tibia + patella. In such cases, we approximated tibia length as a fraction of the reported value, based on the ratio tibia/patella in congeneric species.
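The two approximations described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and averaging the tibia share across congeners is one plausible reading of "a fraction of the value based on the ratio tibia/patella in congeneric species", not necessarily the exact procedure used to build the database.

```python
from statistics import mean

def approx_body_length(prosoma_mm, opisthosoma_mm):
    """Approximate total body length (mm) as prosoma + opisthosoma length."""
    return prosoma_mm + opisthosoma_mm

def approx_tibia_length(tibia_plus_patella_mm, congeners):
    """Split a combined tibia + patella length using congeneric proportions.

    congeners: list of (tibia_mm, patella_mm) pairs from congeneric species.
    Assumption: the tibia share is averaged across congeners.
    """
    # Share of the combined length taken up by the tibia in each congener
    shares = [t / (t + p) for t, p in congeners]
    return tibia_plus_patella_mm * mean(shares)

# Hypothetical measurements, in millimetres
body = approx_body_length(1.2, 1.8)
tibia = approx_tibia_length(2.0, [(1.5, 0.5), (1.2, 0.4)])
```

In this made-up example both congeners have a tibia making up three quarters of the combined length, so the combined value of 2.0 mm is split accordingly.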
In six-eyed spider families (Dysderidae, Leptonetidae, Sicariidae, Symphytognathidae, and Telemidae), for which a pair of eyes is missing due to ontological reasons unrelated to subterranean adaptation, we assumed the missing pair of eyes to be AME 42 . If a paper reported the missing pair of eyes to be ALE, we ignored the

Table 1. Glossary of specialized terms used.

Term (acronym): Definition used in this paper

Functional diversity (FD): Any measure of the diversity of traits of organisms composing a group, such as a community or an ecosystem 12.

Terrestrial subterranean habitat/ecosystem: All the subterranean spaces harbouring species showing traits typical of subterranean life. These include human-accessible natural subterranean spaces (i.e., caves), networks of fissures with sizes smaller than the human scale, and artificial subterranean habitats (e.g., mines, blockhouses, cellars) 101.

Trait: For the purpose of the paper, traits are intended in the broad sense of the World Spider Trait database, namely any phenotypic entity (e.g., morphological, anatomical, ecological, physiological, behavioural) measured at the species level 10,32. In general, all traits are regarded as functional in that they are products of evolution and thus potentially linked to individual fitness 103. However, almost always, the functional connotation of a trait is inferred based on indirect evidence.

Habitat and ecological traits. We classified ecological traits (Table 2) based on ref. 43. We included functional guild, foraging strategy (type of web and method of active hunting), and prey range (specialist or generalist). Conversely, we excluded vertical stratification (ground or vegetation) and circadian activity (diurnal or nocturnal), as these are not relevant for subterranean ecosystems. Instead, we classified vertical stratification in a cave (ground, wall, or both) and potential for long-range dispersal outside subterranean habitats (e.g., ballooning in Meta spiders 44 or active dispersal on the ground in Pimoa 45).

Table 2

In subterranean species, body size is possibly related to habitat (pore) size 104. Difference in size between females and males may provide indirect information on sexual selection mechanisms operating in subterranean habitats.

Leg (and Leg elongation)
Femur I and tibia I length for females and males, and the average of males and females. Femur and tibia elongation is further calculated by dividing the average length by body size.
Leg length is a proxy for overall body size 105. In subterranean spiders, leg length is often related to habitat (pore) size 106 and leg elongation preferentially occurs in subterranean species 107.

Prosoma (size and shape)
Prosoma length, width, and height for females and males.
A proxy for overall body size 108. Shape may vary according to different microhabitats. In certain species, prosoma height is hypothesized to be a proxy measure of subterranean adaptation, i.e., flattening of the prosoma profile with increasing adaptation 39,107.

Cheliceral fang
Fang length for females and males.
The dimension of cheliceral fangs provides information on dietary requirements 104.

Clypeus
Clypeus height for females and males. Same as prosoma height.

Eyes
Diameter of AME, ALE, PME, and PLE. Distance AME-ALE and PME-PLE. Note that a variable "AME_type" describes whether AME are present or missing due to either subterranean adaptation or ontology (six-eyed families; see main text).
In spiders, eye regression is among the most evident morphological changes in response to subterranean conditions 26.
Regression of different groups of eyes provides an indication of different degrees of adaptation. For example, in Troglohyphantes, the anterior median eyes are usually the first to undergo regression 106.

Morphology (Categorical)
Eyes
Binary variables (0 = no; 1 = yes) indicating whether the species has regressed eyes or is eyeless (non-functional eyes). Note that a species can be scored both as having regressed eyes and as eyeless when different populations exhibit different degrees of eye regression.
Mainly literature data (original description or re-descriptions). Literature used is reported in the column "Citation".

Pigmentation
Ordinal variable indicating whether the species is pigmented, variable, partly pigmented, or depigmented.
In spiders, with adaptation to subterranean conditions, body pigment is generally the first morphological character to be lost 26.

Guild
Categorical variable indicating the general functional guild of each spider: Ambush, Ground, Orb, Other, Sensing, Sheet, Space, Sheet-space, or Specialist. Note that the guild 'Sheet-space' is not originally coded in Cardoso et al. 102 . It has been introduced for Pholcidae based on the expert opinion of BH.
Based on literature data and/or our expert opinion.
A general summary of the hunting ecology of each species 43 .

Hunting strategy
Binary variables (0 = no; 1 = yes) indicating the species' web strategy (Capture web, Sensing web, and no web). For each species, we also indicated the type of web if any (Tube web, Sheet web, Space web, Orb Web) and/or the type of active hunting strategy if any (Ambush hunter or Active hunter).
Spiders are important predators in caves; different types of hunting strategies may be associated with different microhabitats. Furthermore, the subterranean environment selects for specific hunting strategies 43.

Diet (Food specialist)
Binary variable (0 = no; 1 = yes) indicating whether the species is a food specialist or not.
Food specialisation is thought to be rare in subterranean communities given the general scarcity of food 26. However, food specialisation seems to be retained in a few species (e.g., Dysderidae 41) and may be associated with niche differentiation to avoid direct competition 109.

Dispersal
Binary variable (0 = no; 1 = yes) indicating whether the species can perform long-range dispersal outside the cave.
Long-range dispersal is rare in subterranean species, and may be found only in generalist species with limited affinity to subterranean habitats.

Habitat preference

Ecological classification
Categorical variable indicating whether the species is a Troglobiont or a Troglophile (see section "Ecological classification" for a definition).
Based on literature data and/or our expert opinion.
Gives a rough indication of the level of dependence of each species on the subterranean medium. See section "Ecological classification" for some cautionary arguments.

Alien status
Binary variable (0 = no; 1 = yes) indicating whether the species is considered an alien species in Europe or not (sensu ref. 43 ).
Subterranean habitats are thought to be poorly permeable to invasion by alien species 49. Still, a few alien elements have been documented, especially in disturbed habitats (e.g., mines); see overview and discussion in ref. 110.

Habitat
Binary variables (0 = no; 1 = yes) indicating whether the species occurs in Deep caves, at Cave entrances, in SSHs, or in External habitats. Note that a single species can occur in several of these.
Gives a rough indication of the type of subterranean habitats occupied by each species. The ability of a species to occupy multiple habitats provides an indication of its general plasticity.

Verticality
Categorical variable indicating whether a species in a cave preferentially dwells on the ground, on the walls, or both.
In a typical subterranean community, different species are often adapted to different microhabitats. In spiders, for example, there can be a niche differentiation between wall and soil-dwelling species 111.

www.nature.com/scientificdata

Ecological classification. In the previous version of the checklist of European subterranean spiders, we also reported an indication of the level of affinity of each species to the subterranean medium 46,47. This was assessed by the experts involved in this paper or taken from literature (whenever this information was mentioned in the original description), and included two categories: (i) Troglophile, for species able to maintain stable subterranean populations or inclined to inhabit subterranean habitats, being, however, associated with surface habitats for some biological functions or able to maintain surface populations too; and (ii) Troglobiont, for species strictly bound to subterranean habitats. For consistency, we included this ecological classification in this update of the checklist and in the trait database. However, since the definition of ecological categories is traditionally a stumbling block of biospeleology 48,49, and has sparked some debate in the form of personal communications, we would like to clarify its real meaning. The attribution of some species to one category or another may be problematic, as this is not a strictly categorical trait but can often be seen as a continuum from troglobionts to surface dwellers, including the intermediate troglophiles. As we see it, this is just a practical tool that allows one to roughly subdivide groups of species into broad macro-categories. The proper way to assess the affinity of a species for subterranean or surface habitats would be a systematic survey including extensive sampling primarily in the surface habitats, population studies, and a robust phylogenetic framework 47, all of which are practically non-existent for most subterranean spiders.
There are, however, alternative ways to do so depending on the research questions of interest 50 . For example, one can by-pass this classification and simply use morphological traits such as eye regression, leg elongation, and pigmentation as a proxy for the subterranean specialization of each species.

Data visualisation.
To illustrate the usage of the dataset, we plotted the distribution of key traits as density plots with the R library 'ggplot2' 51. We also generated a representation of the trait space for European subterranean spiders showing its general organisation and the position of each spider family within it. To this end, we selected a subset of traits from the whole trait matrix, representing: (I) General morphology of species (Average body size, Sexual size dimorphism, and Prosoma shape); (II) Morphological adaptation to subterranean conditions, including both categorical (Pigmentation, Presence/absence of Eyes, Eye regression, Leg elongation, AME, ALE, PME, and PLE, and AME type) and continuous (Femur elongation and Profile reduction) traits; (III) Hunting strategy (all binary variables referring to hunting strategy and diet, as well as the continuous variable Fang length); (IV) Dispersal behaviour (Dispersal); and (V) Microhabitat occupation (Verticality).

We performed data exploration following recommendations in ref. 52, checking variable distribution, multicollinearity among continuous traits via Pearson's r correlations, and presence of missing data. As a result of data exploration, we excluded Fang length and Profile reduction because they contained more than 80% of missing values. To homogenize variable distribution, we log-transformed all continuous variables that do not assume negative values. We also standardized all continuous traits to mean = 0 and standard deviation = 1 to ensure comparable ranges among traits.
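The preparation of continuous traits described above (dropping traits with too many missing values, log-transforming, then standardizing) can be sketched in a few lines. This is a simplified illustration, not the code used for the paper: the trait names and values are made up, the natural logarithm is applied on the assumption that all values are strictly positive, and the sample standard deviation is an assumed choice.

```python
import math
from statistics import mean, stdev

MAX_MISSING = 0.8  # traits with more than 80% missing values are excluded

def missing_fraction(values):
    """Fraction of species for which the trait is missing (None)."""
    return sum(v is None for v in values) / len(values)

def log_standardize(values):
    """Log-transform, then scale to mean 0 and standard deviation 1.

    Missing values (None) are ignored in the statistics and kept as None.
    Assumes every observed value is strictly positive.
    """
    logged = [None if v is None else math.log(v) for v in values]
    observed = [v for v in logged if v is not None]
    centre, spread = mean(observed), stdev(observed)
    return [None if v is None else (v - centre) / spread for v in logged]

# Hypothetical trait columns for six species
traits = {
    "body_length": [2.0, 4.0, 8.0, None, 4.0, 4.0],
    "fang_length": [0.4, None, None, None, None, None],
}

# The filter is evaluated before the transformation, so sparse traits
# are never passed to log_standardize
prepared = {
    name: log_standardize(vals)
    for name, vals in traits.items()
    if missing_fraction(vals) <= MAX_MISSING
}
```

Here the hypothetical "fang_length" column is dropped because five of its six values are missing, while "body_length" is retained and rescaled.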
Since the trait matrix contains continuous, binary, and categorical variables, we used a Gower distance to estimate trait dissimilarity among species 53. Because different traits span different functional roles, we used an optimisation method to attribute weights to traits within groups 54. To this end, we assigned traits to the five groups of variables defined above.
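As a rough illustration of how a Gower dissimilarity handles mixed trait types and missing values, consider the sketch below. It uses equal trait weights by default and does not implement the weight optimisation mentioned above; the species, trait names, and values are hypothetical.

```python
def gower_distance(a, b, kinds, ranges, weights=None):
    """Gower dissimilarity between two species described by mixed traits.

    a, b:    dicts mapping trait name -> value (None when missing)
    kinds:   dict mapping trait name -> "continuous", "binary" or "categorical"
    ranges:  dict mapping continuous trait name -> observed range (max - min)
    weights: optional dict of trait weights (equal weights when omitted)
    """
    weights = weights or {trait: 1.0 for trait in kinds}
    numerator = denominator = 0.0
    for trait, kind in kinds.items():
        x, y = a.get(trait), b.get(trait)
        if x is None or y is None:
            continue  # traits missing in either species are skipped
        if kind == "continuous":
            d = abs(x - y) / ranges[trait] if ranges[trait] > 0 else 0.0
        else:
            d = 0.0 if x == y else 1.0  # simple matching for binary/categorical
        numerator += weights[trait] * d
        denominator += weights[trait]
    return numerator / denominator if denominator > 0 else None

# Hypothetical species and traits
species = {
    "sp_A": {"femur": 2.0, "eyeless": 1, "guild": "Sheet"},
    "sp_B": {"femur": 4.0, "eyeless": 0, "guild": "Sheet"},
    "sp_C": {"femur": None, "eyeless": 1, "guild": "Orb"},
}
kinds = {"femur": "continuous", "eyeless": "binary", "guild": "categorical"}

# Range of each continuous trait across the species with observed values
femur_values = [s["femur"] for s in species.values() if s["femur"] is not None]
ranges = {"femur": max(femur_values) - min(femur_values)}

d_ab = gower_distance(species["sp_A"], species["sp_B"], kinds, ranges)
d_ac = gower_distance(species["sp_A"], species["sp_C"], kinds, ranges)
```

Because the comparison between "sp_A" and "sp_C" skips the missing femur value, their dissimilarity is averaged over the two remaining traits only.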
We visualized the trait space as the first two axes of a principal coordinate analysis using the trait dissimilarity matrix as input data, using the R package 'ape' version 5.5.0 55. For graphical visualisation, we estimated the density of species on the ordination diagram using a kernel density estimate. Furthermore, we visualized the centroid of each family to get an overview of the spatial relationships among families within the trait space. To relate traits to ordination axes, we used the function envfit from the R package 'vegan' version 2.5.7 56. This function calculates a multiple linear regression of the traits (dependent variable) on the species scores on ordination axes (independent variables). The normalized regression coefficients multiplied by the square root of the coefficient of determination are used to position the trait on the ordination diagram. Note that this analysis was only possible for complete cases, i.e., species without missing traits (N = 154). We performed all analyses in R version 4.1.0 57.
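The positioning of a trait on the ordination diagram, as described above, can be illustrated with a small stand-alone sketch for a single continuous trait and two axes. It follows the verbal description only (regression of the trait on the axis scores, coefficients normalized to unit length and scaled by the square root of the coefficient of determination); it is not a reimplementation of envfit and omits, for instance, any significance testing. The function name and the data are hypothetical.

```python
import math

def fit_trait_vector(axis1, axis2, trait):
    """Regress a trait on two ordination axes and return (x, y, r2).

    (x, y) is the unit-length coefficient vector scaled by sqrt(r2).
    Assumes the trait is not constant and the axes are not collinear.
    """
    n = len(trait)
    m1, m2, my = sum(axis1) / n, sum(axis2) / n, sum(trait) / n
    x1 = [v - m1 for v in axis1]
    x2 = [v - m2 for v in axis2]
    y = [v - my for v in trait]

    # Normal equations for a two-predictor least-squares fit on centred data
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det

    # Coefficient of determination of the fitted regression
    fitted = [b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_total = sum(c * c for c in y)
    ss_residual = sum((c - f) ** 2 for c, f in zip(y, fitted))
    r2 = max(0.0, 1.0 - ss_residual / ss_total)

    # Normalize the coefficients and scale by sqrt(r2)
    scale = math.sqrt(r2) / math.hypot(b1, b2)
    return b1 * scale, b2 * scale, r2

# Hypothetical species scores on two axes and a trait that follows axis 1
arrow_x, arrow_y, r2 = fit_trait_vector(
    [-1.0, 0.0, 1.0, 0.0], [0.0, -1.0, 0.0, 1.0], [1.0, 2.0, 3.0, 2.0]
)
```

In this toy case the trait increases linearly along the first axis only, so the resulting vector points along that axis.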

Data Records
The trait database is available in Figshare 58 as a tab-delimited file (.csv) and in Excel (.xlsx) format. Traits missing from the World Spider Trait database were also deposited therein and are accessible directly in the R environment using the function traits in the R package 'arakno' 59.
Detailed explanation of traits, including their hypothesized functional meaning, is given in Table 2. The dataset consists of 64 traits (some examples of trait distributions are given in Fig. 2) for 512 species belonging to 20 families associated with caves (Table 3), 34 species more than in the previous checklist 29. The family comprising most species is Linyphiidae (224 species, almost half of them belonging to a single genus, Troglohyphantes), followed by Dysderidae (62 species), Leptonetidae (60), Nesticidae (56), and Agelenidae (43). All these families comprise several specialized species only found in subterranean habitats and showing traits such as full eye regression and complete depigmentation, but also generalist species exhibiting a low degree of morphological specialisation to subterranean life (Fig. 3). The remaining families are all represented by up to 30 species and encompass spiders with diverse levels of subterranean specialisation. We refer the reader to ref. 29 for an in-depth taxonomic and biogeographical account.

Technical Validation
There are some limitations that one must be aware of when using the dataset:

(I) Given the low availability of specimens (many of these species were collected at the time of their description and never recorded thereafter), the dataset is a mixture of literature data and direct measures. It includes families for which we have been able to measure all species (Pholcidae) and others for which over half of species traits are derived from original descriptions and other literature sources (e.g., Linyphiidae and Leptonetidae). Unfortunately, many original descriptions, both recent and old, contain poor information, limiting the possibility to extract traits.

(II) For the same reason, there is a high frequency of missing data for some traits and species. This means that one may want to focus on traits that are well sampled and use a reduced matrix of only well-sampled traits in community-level analyses. There are statistical ways to partly remedy these problems. Different imputation methods can be used to infer missing trait values 60,61. Most of these imputation tools are implemented in the function fill in the R package 'BAT' 62,63. Also, in community-level analyses, one can use functional distance measures able to accommodate missing data, especially Gower distances 54. The latter method is the one we used to generate the trait space in Fig. 4.

(III) Given the scale of the dataset and the lack of multiple specimens for most species considered, this dataset does not contain information on intraspecific variability, one exception being the minimum-maximum range for body size. It is well known that intraspecific trait variability is an important aspect of community ecology 64,65, which can be pronounced in many taxa 66,67. This seems to be true also for subterranean spiders.
For example, in the well-studied case of Western Alpine Troglohyphantes (Linyphiidae), intraspecific variability has been reported for morphological traits relating to subterranean adaptation 38,39,68 , but also in individual thermal tolerance 68 . Likewise, individuals of Kryptonesticus eremita (Simon, 1880) (Nesticidae) may show different levels of pigmentation depending on how far from the cave entrance they are collected 69 . Accordingly, any analysis based on this database must be taken as an average representation of the process under study, and the information relativized accordingly 70 .

Usage Notes
Complementarity with other databases. The [...] some species depending on the latest nomenclature changes. This can be done automatically using the function checknames in the R package 'arakno' 59, which checks for nomenclature changes, synonyms, and spelling errors by querying the most up-to-date version of the World Spider Catalog 34.

Example of trait-based research questions in subterranean biology.
As old as the recognition of the bizarre morphology of cave species is the search for ecological and evolutionary explanations for these unique adaptations 72. Trait-based ecology is a critical framework to this end 73. By focusing on how traits interact mechanistically with environments across spatial scales and levels of organization, we can use geographically ubiquitous and ecologically diverse cave spiders to test hypotheses in subterranean biology and beyond 22. Here, we provide some examples of avenues of research, hoping to stimulate both the re-use of the dataset and the development of similar databases for spiders outside Europe and for other subterranean taxa.
Quantifying functional redundancy and subterranean specialization. Different spider species and families occupy distinct regions of the trait space (Fig. 4). The position of the species in the trait space can be mapped to obtain a quantification of their functional redundancy (e.g., if multiple species fall within highly sampled areas of the trait space). Within a given group (e.g., family or genus), one could also rank species according to their degree of adaptation by using traits relating to subterranean adaptation and calculating the functional distance of each troglobiont species from the average troglophile species or the closest surface species 50, following the saying "nothing [makes] sense in speleobiology without a comparison of cave animals with the 'normal' epigean ones" 74. Ultimately, the quantification of the level of subterranean specialization of species on a continuous scale allows us to explore the degree to which the specialization of a given community relates to local environmental conditions, interspecific interactions, and more.
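The ranking idea outlined above can be sketched as follows. For simplicity, this example swaps in a plain Euclidean distance on already standardized numeric traits rather than a Gower-based functional distance, and it uses the average troglophile as the only reference point. Species names, categories, and trait values are hypothetical.

```python
import math

def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length trait vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]

def rank_by_specialization(species):
    """Rank troglobionts by distance from the average troglophile.

    species: list of (name, category, trait_vector) tuples, where category
    is "Troglobiont" or "Troglophile". Returns (name, distance) pairs,
    most distant (putatively most specialized) first.
    """
    reference = centroid([v for _, cat, v in species if cat == "Troglophile"])
    scores = {
        name: math.dist(vector, reference)
        for name, cat, vector in species
        if cat == "Troglobiont"
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical trait vectors: (eye regression score, femur elongation)
species = [
    ("sp_A", "Troglophile", [0.0, 0.5]),
    ("sp_B", "Troglophile", [0.0, 0.7]),
    ("sp_C", "Troglobiont", [1.0, 0.6]),
    ("sp_D", "Troglobiont", [1.0, 1.8]),
]
ranking = rank_by_specialization(species)
```

With these made-up values, "sp_D" lies farther from the troglophile centroid than "sp_C" and is therefore ranked first.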
Trait-based (macro)ecology of subterranean spiders. Traits are a useful aid for answering a range of questions in community ecology and macroecology 12,15,16,75. To what extent is there functional convergence in the functional space of subterranean spider communities in a given region? Do different microhabitats within a cave select for functionally unique spiders? What is the maximum degree of functional similarity that allows two or more species to occupy the same environment? How does the functional space of a given subterranean community change after a perturbation event (e.g., the extinction of some species, the invasion by a non-native species)?
Such questions can be answered using metrics such as community weighted trait means 76 or more advanced ways to calculate the functional richness, dispersion, and regularity of the trait space occupied by a given community (e.g., functional dendrograms 77 or probabilistic hypervolumes 78). We refer the reader to recent accounts on functional diversity analyses for operational details about such analyses 12,13. Also, all these questions can be explored at different scales, from local communities inhabiting a single cave or cave system up to entire karst areas and even continents. This latter possibility is enhanced by the availability of broad-scale distribution and community composition data for European cave spiders 30. For example, a recent study demonstrated that there is a quick turnover in the taxonomic diversity of subterranean spiders across Europe, mediated primarily by the geographic distance among caves and secondarily by the climatic conditions and availability of karst habitat, ensuring cave connectivity 79. The usage of traits enables us to test whether the same distance decay occurs with respect to functional diversity, or if taxonomically distinct communities in caves can fulfil similar functional roles, thereby determining a lower turnover in functional diversity in Europe. Whereas it is well known that taxonomic and functional diversity decay at different rates along geographical and environmental gradients across different terrestrial and marine habitats and organisms 80, similar patterns have never been explored in subterranean habitats.
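As a small worked example of the simplest metric mentioned above, a community weighted trait mean averages a trait across the species of a community, weighting each species by its abundance. The sketch below assumes abundance counts per species and simply leaves out species whose trait value is missing; the species names and numbers are hypothetical.

```python
def community_weighted_mean(abundances, trait_values):
    """Abundance-weighted mean of one trait in one community.

    abundances:   dict mapping species -> abundance in the community
    trait_values: dict mapping species -> trait value (None when missing)
    Species without a trait value are left out and the weights are
    rescaled over the remaining species (an assumption of this sketch).
    """
    shared = [sp for sp in abundances if trait_values.get(sp) is not None]
    total = sum(abundances[sp] for sp in shared)
    if total == 0:
        return None
    return sum(abundances[sp] * trait_values[sp] for sp in shared) / total

# Hypothetical cave community and body lengths (mm)
cave_abundances = {"sp_A": 6, "sp_B": 3, "sp_C": 1}
body_length = {"sp_A": 2.0, "sp_B": 4.0, "sp_C": None}

cwm = community_weighted_mean(cave_abundances, body_length)
```

Here the value is (6 × 2.0 + 3 × 4.0) / 9, since "sp_C" has no body length recorded.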
Trait-based conservation of subterranean spiders. Species traits can be useful in conservation science, for example to assess species extinction risk, to prioritize species and habitat for conservation, and ultimately to define long-term conservation strategies 81 .
At the individual level, there has been recent interest in understanding the relationship between species traits and extinction risk, namely whether species possessing specific traits (e.g. larger body size, greater longevity) are more prone to extinction 82,83 . To the best of our knowledge, similar considerations have never been applied to subterranean species, let alone spiders.
At the community level, one can identify species with unique and original traits ('functional outliers' sensu ref. 84) versus species falling within densely populated regions of the trait space. This makes it possible to map the extinction risk across a given global functional spectrum (e.g., the functional space of European cave spiders in Fig. 4) and ultimately to provide general guidance on where to focus in the search for priority species for conservation 85. The rationale behind this possibility is that functionally unique species are often irreplaceable, whereas the ecological role of functionally redundant species can be performed by functionally analogous species in the community ('biological insurance' sensu ref. 86).
Historically, subterranean ecosystems have been overlooked in global biodiversity conservation agendas 87. In recent years, as the conservation importance of subterranean ecosystems is being reaffirmed, there is a growing need to develop objective ways to prioritize subterranean species and regions to protect. There are several examples of studies proposing operational indexes targeting top-priority caves or subterranean sites for protection given a scenario of limited resources invested in conservation 88-90. These high-priority sites usually end up corresponding with so-called "hotspots of subterranean diversity" 91. However, in our view, all these prioritization attempts fall short on one key aspect: they only consider number of species and/or endemism in their protocol to design protected areas or conservation priorities (but see, e.g., refs. 92,93). A modern take on this subject would be to not only consider taxonomic diversity and relative measures, but also to maximize phylogenetic and functional diversity within a given protected area 94,95, and even the extent to which species niches are accounted for 96. A trait dataset such as the one released in this work is a first, necessary step towards the goal of obtaining a multi-pronged prioritization that accounts for multiple biodiversity facets 97. This is of the utmost importance given the current threats to subterranean ecosystems, and the unique conservation challenges associated with these biota 98-100.